Summary

  • The goal of this project is to use machine learning to predict Ag archiving capacity (or ability) in LECs from naive mice.
  • For a proof-of-principle analysis, Ag-tracking data for d14 cLECs were used to train a random forest classifier to predict Ag status.
  • Using this model, we defined a gene program that correlates with Ag status at various timepoints.
  • Archiving-competent cLECs can be predicted in the CHIKV LN scRNA-seq data.
  • There is a reduction in archiving-competent cLECs in CHIKV-infected mice and a broad downregulation of the Ag-archiving gene program.
  • The central goal for this project is to optimize the model (e.g. expand to other cell types) and use it to assess archiving capacity in samples that did not receive an Ag-tag (e.g. other published datasets). We can then identify perturbations or treatments that are predicted to impair archiving.


Classifying Ag-high

Ag-low and -high cells were identified by separately clustering each LEC subset for each sample into two groups based on Ag-score. For the 6wk-3wk sample, the 3wk Ag score is used. Ag-low/high classifications used for the analysis are shown below.
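The two-group split described above can be sketched as follows. This is a minimal illustration only: the report does not state which clustering method was used, so a simple one-dimensional two-means split is assumed here, and the field names `sample`, `subset`, and `ag_score` are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def split_low_high(scores, max_iter=100):
    """Split 1-D Ag scores into 'low' and 'high' groups.

    Assumption: a two-means split stands in for the unspecified
    clustering method used in the report.
    """
    if not scores:
        return []
    lo, hi = min(scores), max(scores)
    if lo == hi:
        # No spread in the scores, so nothing can be called high
        return ["low"] * len(scores)
    for _ in range(max_iter):
        cut = (lo + hi) / 2
        new_lo = mean(s for s in scores if s <= cut)
        new_hi = mean(s for s in scores if s > cut)
        if (new_lo, new_hi) == (lo, hi):
            break  # cluster centers have stopped moving
        lo, hi = new_lo, new_hi
    cut = (lo + hi) / 2
    return ["high" if s > cut else "low" for s in scores]


def classify_cells(cells):
    """Label each cell within its own (sample, LEC subset) group."""
    groups = defaultdict(list)
    for i, cell in enumerate(cells):
        groups[(cell["sample"], cell["subset"])].append(i)
    labels = [None] * len(cells)
    for idx in groups.values():
        group_labels = split_low_high([cells[i]["ag_score"] for i in idx])
        for i, label in zip(idx, group_labels):
            labels[i] = label
    return labels
```

Because each sample and LEC subset is split separately, a cell is called Ag-high relative to other cells of the same type in the same sample, not relative to a global threshold.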


A random forest classifier was trained using data for d14 cLECs. The model was then used to predict Ag-high cells in the other Ag datasets.

The fraction of cells belonging to each predicted Ag group is shown on the left for cLECs from each sample. The fraction of true Ag-low, true Ag-high, and false-positive Ag-high cells (high-pred) is shown on the right.

  • The model is fairly accurate in predicting Ag-high cells in the training and test data (d14 cLECs) but does not perform as well when predicting Ag-low cells. This can be improved with further optimization.
  • Since we want to identify gene signatures that are expressed in naive mice and continue to be expressed after Ag levels have fallen, we expect to observe an increasing fraction of false-positive Ag-high cells at the later timepoints.
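The three groups plotted above (true Ag-low, true Ag-high, and high-pred) can be derived from the true and predicted labels as sketched below. The function names are hypothetical, and the sketch assumes that every cell with a true Ag-high label is counted as true Ag-high regardless of its prediction, which the report does not state explicitly.

```python
from collections import Counter


def ag_class(true_label, pred_label):
    """Assign a cell to 'low', 'high', or 'high-pred'."""
    if true_label == "high":
        return "high"        # true Ag-high (assumed to ignore the prediction)
    if pred_label == "high":
        return "high-pred"   # false-positive Ag-high
    return "low"             # true Ag-low, predicted Ag-low


def class_fractions(true_labels, pred_labels):
    """Fraction of cells in each Ag class for one sample."""
    classes = [ag_class(t, p) for t, p in zip(true_labels, pred_labels)]
    counts = Counter(classes)
    total = len(classes)
    return {name: counts[name] / total for name in ("low", "high", "high-pred")}
```

Running `class_fractions` per sample gives the quantity of interest for the later timepoints: the share of cells in the `high-pred` group.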


Model accuracy was assessed for each LEC subset. F1 scores are shown for different combinations of testing and training data.

  • Ag-high cLECs, collecting LECs, and fLECs are the easiest to predict. All models show high F1 scores when tested using these LEC subsets.
  • Ag-high Ptx3 LECs and BECs are the most difficult to predict accurately. This is expected since these cell populations have the lowest Ag signal and the fewest Ag-high cells.
  • The F1 score is not always highest when the models are tested using the training cell type. This is not necessarily surprising since the models were selected using several metrics in addition to the F1 score.
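For reference, the F1 score is the harmonic mean of precision and recall for the positive class. A minimal sketch is shown below, assuming Ag-high is treated as the positive class; the function name and label strings are illustrative and not taken from the analysis code.

```python
def f1_score(true_labels, pred_labels, positive="high"):
    """F1 = 2 * precision * recall / (precision + recall)."""
    tp = fp = fn = 0
    for t, p in zip(true_labels, pred_labels):
        if p == positive and t == positive:
            tp += 1   # predicted Ag-high, truly Ag-high
        elif p == positive:
            fp += 1   # predicted Ag-high, truly Ag-low
        elif t == positive:
            fn += 1   # predicted Ag-low, truly Ag-high
    if tp == 0:
        return 0.0    # no true positives: precision and recall are 0 or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

Because the F1 score ignores true negatives, subsets with very few Ag-high cells can score poorly even when most Ag-low cells are classified correctly.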




Ag modules

Expression of the upregulated (top) and downregulated (bottom) gene modules that are most predictive of Ag signal is shown below.

  • There is a notable correlation between the expression of these genes and the Ag class.
  • False-positive Ag-high cells (high-pred) show an intermediate level of expression that falls roughly between the true Ag-low and true Ag-high cells.
  • The false-positive Ag-high cells may be archiving-competent cells that have lost or released most of their Ag by the later timepoints.


UMAP projections show Ag-high module expression (top), true Ag-low vs true Ag-high (middle), and false-positive Ag-high (high-pred) vs true Ag-high (bottom).

  • False-positive Ag-high cells (high-pred) show strong overlap with true Ag-high cells.

cLEC




Collecting




fLEC




Ptx3_LEC





Expression of Ag-high and Ag-low gene modules is compared for the 6wk, 3wk, and 6wk-3wk samples. Module expression is shown for Ag double-positive cells from the 6wk-3wk mouse (double-high), cells positive for a single Ag-tag (single-high), and Ag-high cells from the 6wk or 3wk mouse (6wk-high, 3wk-high). All Ag-low cells are plotted as a single group.

  • When compared to the 6wk and 3wk mice, double-positive Ag-high cLECs show slightly higher expression of the Ag-high module and slightly lower expression of the Ag-low module.
  • This effect is subtle and requires more investigation, but it suggests that double-positive cells are better equipped to archive antigen.




Ag features

Mean expression in cLECs is shown on the left for genes from the Ag-high module for true Ag-low, true Ag-high, and false-positive Ag-high (predicted, high-pred) cells. Expression is shown on the right for select top features.

Points show median expression, grey bars show the interquartile range, the dotted line shows the trend, and arrows indicate that the gene is significantly up- or downregulated when compared to Ag-low cells.

Collecting-high




Ptx3_LEC-high




cLEC-high




fLEC-high




Collecting-low




Ptx3_LEC-low




cLEC-low




fLEC-low




Ag archiving in CHIKV

The fraction of cells predicted to be Ag-high (i.e. archiving competent) is shown below for each biological replicate. p-values < 0.05 are shown.

  • There is a reduction in archiving-competent cells in CHIKV-infected LN samples.
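The per-replicate quantity compared above can be sketched as a simple grouped fraction. The field names `treatment`, `replicate`, and `pred` are hypothetical, and the statistical test behind the reported p-values is not specified in this report, so it is not reproduced here.

```python
from collections import defaultdict


def fraction_pred_high(cells):
    """Fraction of cells predicted Ag-high for each (treatment, replicate)."""
    totals = defaultdict(int)
    highs = defaultdict(int)
    for cell in cells:
        key = (cell["treatment"], cell["replicate"])
        totals[key] += 1
        if cell["pred"] == "high":
            highs[key] += 1
    return {key: highs[key] / totals[key] for key in totals}
```

Computing the fraction per biological replicate, and not on pooled cells, keeps the replicate as the unit of comparison between mock- and CHIKV-infected samples.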


Expression of the Ag-high (top) and Ag-low (bottom) gene modules is shown below for each predicted Ag class for the 24 hpi timepoint.

  • Cells predicted to be archiving-competent show upregulation of the Ag-high gene module.
  • The Ag-low gene module shows similar expression between samples.


Mean expression is shown for genes from the Ag-high module for mock- and CHIKV-infected mice from the 24 hpi timepoint.

  • CHIKV-infected mice broadly downregulate genes from the Ag-high module.

Collecting




cLEC




fLEC




Session info

## R version 4.3.1 (2023-06-16)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 22.04.3 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so;  LAPACK version 3.10.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8          LC_NUMERIC=C                 
##  [3] LC_TIME=en_US.UTF-8           LC_COLLATE=en_US.UTF-8       
##  [5] LC_MONETARY=en_US.UTF-8       LC_MESSAGES=en_US.UTF-8      
##  [7] LC_PAPER=en_US.UTF-8          LC_NAME=en_US.UTF-8          
##  [9] LC_ADDRESS=en_US.UTF-8        LC_TELEPHONE=en_US.UTF-8     
## [11] LC_MEASUREMENT=en_US.UTF-8    LC_IDENTIFICATION=en_US.UTF-8
## 
## time zone: America/Denver
## tzcode source: system (glibc)
## 
## attached base packages:
##  [1] stats4    tools     grid      stats     graphics  grDevices utils    
##  [8] datasets  methods   base     
## 
## other attached packages:
##  [1] ggtree_3.8.2          GOSemSim_2.26.1       org.Mm.eg.db_3.17.0  
##  [4] AnnotationDbi_1.62.2  IRanges_2.34.1        S4Vectors_0.38.2     
##  [7] Biobase_2.60.0        BiocGenerics_0.46.0   msigdbr_7.5.1        
## [10] enrichplot_1.20.3     clusterProfiler_4.8.3 caret_6.0-94         
## [13] lattice_0.21-8        furrr_0.3.1           future_1.33.0        
## [16] ranger_0.15.1         rsample_1.2.0         harmony_1.0.3        
## [19] biomaRt_2.56.1        openxlsx_4.2.5.2      MetBrewer_0.2.0      
## [22] rdrop2_0.8.2.1        ggtext_0.1.2          ggtrace_0.2.0        
## [25] qs_0.25.5             vroom_1.6.3           M3Drop_1.26.0        
## [28] numDeriv_2016.8-1.1   djvdj_0.1.0           gtools_3.9.4         
## [31] clustifyrdata_1.1.0   here_1.0.1            presto_1.0.0         
## [34] data.table_1.14.8     Rcpp_1.0.11           devtools_2.4.5       
## [37] usethis_2.2.2         ComplexHeatmap_2.16.0 patchwork_1.1.3      
## [40] scales_1.2.1          boot_1.3-28.1         clustifyr_1.12.0     
## [43] mixtools_2.0.0        broom_1.0.5           colorblindr_0.1.0    
## [46] colorspace_2.1-0      xlsx_0.6.5            RColorBrewer_1.1-3   
## [49] ggrepel_0.9.3         cowplot_1.1.1         knitr_1.44           
## [52] gprofiler2_0.2.2      SeuratObject_4.1.4    Seurat_4.4.0         
## [55] ggforce_0.4.1         ggbeeswarm_0.7.2      lubridate_1.9.3      
## [58] forcats_1.0.0         stringr_1.5.0         dplyr_1.1.3          
## [61] purrr_1.0.2           readr_2.1.4           tidyr_1.3.0          
## [64] tibble_3.2.1          ggplot2_3.4.3         tidyverse_2.0.0      
## 
## loaded via a namespace (and not attached):
##   [1] igraph_1.5.1                ica_1.0-3                  
##   [3] plotly_4.10.2               Formula_1.2-5              
##   [5] zlibbioc_1.46.0             tidyselect_1.2.0           
##   [7] bit_4.0.5                   doParallel_1.0.17          
##   [9] clue_0.3-65                 rjson_0.2.21               
##  [11] blob_1.2.4                  urlchecker_1.0.1           
##  [13] S4Arrays_1.0.6              parallel_4.3.1             
##  [15] png_0.1-8                   cli_3.6.1                  
##  [17] ggplotify_0.1.2             goftest_1.2-3              
##  [19] kernlab_0.9-32              densEstBayes_1.0-2.2       
##  [21] uwot_0.1.16                 shadowtext_0.1.2           
##  [23] curl_5.0.2                  mime_0.12                  
##  [25] evaluate_0.22               tidytree_0.4.5             
##  [27] leiden_0.4.3                stringi_1.7.12             
##  [29] pROC_1.18.4                 backports_1.4.1            
##  [31] XML_3.99-0.14               httpuv_1.6.11              
##  [33] magrittr_2.0.3              rappdirs_0.3.3             
##  [35] splines_4.3.1               prodlim_2023.08.28         
##  [37] RApiSerialize_0.1.2         ggraph_2.1.0               
##  [39] sctransform_0.4.0           sessioninfo_1.2.2          
##  [41] DBI_1.1.3                   jquerylib_0.1.4            
##  [43] withr_2.5.1                 class_7.3-22               
##  [45] rprojroot_2.0.3             lmtest_0.9-40              
##  [47] bdsmatrix_1.3-6             tidygraph_1.2.3            
##  [49] htmlwidgets_1.6.2           fs_1.6.3                   
##  [51] SingleCellExperiment_1.22.0 segmented_1.6-4            
##  [53] labeling_0.4.3              MatrixGenerics_1.12.3      
##  [55] reticulate_1.32.0           zoo_1.8-12                 
##  [57] XVector_0.40.0              timechange_0.2.0           
##  [59] foreach_1.5.2               fansi_1.0.4                
##  [61] caTools_1.18.2              timeDate_4022.108          
##  [63] irlba_2.3.5.1               gridGraphics_0.5-1         
##  [65] ellipsis_0.3.2              lazyeval_0.2.2             
##  [67] yaml_2.3.7                  survival_3.5-5             
##  [69] scattermore_1.2             crayon_1.5.2               
##  [71] RcppAnnoy_0.0.21            progressr_0.14.0           
##  [73] tweenr_2.0.2                later_1.3.1                
##  [75] ggridges_0.5.4              codetools_0.2-19           
##  [77] base64enc_0.1-3             GlobalOptions_0.1.2        
##  [79] profvis_0.3.8               KEGGREST_1.40.1            
##  [81] bbmle_1.0.25                Rtsne_0.16                 
##  [83] shape_1.4.6                 filelock_1.0.2             
##  [85] foreign_0.8-84              pkgconfig_2.0.3            
##  [87] xml2_1.3.5                  GenomicRanges_1.52.0       
##  [89] aplot_0.2.2                 spatstat.sparse_3.0-2      
##  [91] ape_5.7-1                   viridisLite_0.4.2          
##  [93] xtable_1.8-4                plyr_1.8.8                 
##  [95] httr_1.4.7                  globals_0.16.2             
##  [97] hardhat_1.3.0               pkgbuild_1.4.2             
##  [99] beeswarm_0.4.0              htmlTable_2.4.1            
## [101] checkmate_2.2.0             nlme_3.1-162               
## [103] loo_2.6.0                   HDO.db_0.99.1              
## [105] dbplyr_2.3.4                digest_0.6.33              
## [107] Matrix_1.6-1.1              farver_2.1.1               
## [109] tzdb_0.4.0                  reshape2_1.4.4             
## [111] ModelMetrics_1.2.2.2        yulab.utils_0.1.0          
## [113] viridis_0.6.4               rpart_4.1.19               
## [115] glue_1.6.2                  cachem_1.0.8               
## [117] BiocFileCache_2.8.0         polyclip_1.10-6            
## [119] Hmisc_5.1-1                 generics_0.1.3             
## [121] Biostrings_2.68.1           mvtnorm_1.2-3              
## [123] parallelly_1.36.0           pkgload_1.3.3              
## [125] statmod_1.5.0               pbapply_1.7-2              
## [127] SummarizedExperiment_1.30.2 gson_0.1.0                 
## [129] utf8_1.2.3                  gower_1.0.1                
## [131] graphlayouts_1.0.1          StanHeaders_2.26.28        
## [133] gridExtra_2.3               shiny_1.7.5                
## [135] lava_1.7.2.1                GenomeInfoDbData_1.2.10    
## [137] RCurl_1.98-1.12             memoise_2.0.1              
## [139] rmarkdown_2.25              downloader_0.4             
## [141] RANN_2.6.1                  stringfish_0.15.8          
## [143] spatstat.data_3.0-1         rstudioapi_0.15.0          
## [145] cluster_2.1.4               QuickJSR_1.0.6             
## [147] rstantools_2.3.1.1          spatstat.utils_3.0-3       
## [149] hms_1.1.3                   fitdistrplus_1.1-11        
## [151] munsell_0.5.0               rlang_1.1.1                
## [153] GenomeInfoDb_1.36.3         ipred_0.9-14               
## [155] circlize_0.4.15             mgcv_1.8-42                
## [157] xfun_0.40                   e1071_1.7-13               
## [159] remotes_2.4.2.1             recipes_1.0.8              
## [161] iterators_1.0.14            matrixStats_1.0.0          
## [163] reldist_1.7-2               abind_1.4-5                
## [165] rstan_2.26.23               treeio_1.24.3              
## [167] rJava_1.0-6                 bitops_1.0-7               
## [169] ps_1.7.5                    promises_1.2.1             
## [171] inline_0.3.19               scatterpie_0.2.1           
## [173] RSQLite_2.3.1               qvalue_2.32.0              
## [175] proxy_0.4-27                fgsea_1.26.0               
## [177] DelayedArray_0.26.7         GO.db_3.17.0               
## [179] compiler_4.3.1              prettyunits_1.2.0          
## [181] listenv_0.9.0               tensor_1.5                 
## [183] MASS_7.3-60                 progress_1.2.2             
## [185] BiocParallel_1.34.2         gridtext_0.1.5             
## [187] babelgene_22.9              spatstat.random_3.1-6      
## [189] R6_2.5.1                    fastmap_1.1.1              
## [191] fastmatch_1.1-4             vipor_0.4.5                
## [193] ROCR_1.0-11                 nnet_7.3-19                
## [195] gtable_0.3.4                KernSmooth_2.23-21         
## [197] miniUI_0.1.1.1              deldir_1.0-9               
## [199] htmltools_0.5.6             RcppParallel_5.1.7         
## [201] bit64_4.0.5                 spatstat.explore_3.2-3     
## [203] lifecycle_1.0.3             zip_2.3.0                  
## [205] processx_3.8.2              callr_3.7.3                
## [207] xlsxjars_0.6.1              sass_0.4.7                 
## [209] vctrs_0.6.3                 spatstat.geom_3.2-5        
## [211] DOSE_3.26.2                 ggfun_0.1.3                
## [213] sp_2.0-0                    future.apply_1.11.0        
## [215] entropy_1.3.1               bslib_0.5.1                
## [217] pillar_1.9.0                gplots_3.1.3               
## [219] jsonlite_1.8.7              GetoptLong_1.0.5